A quantile regression approach to define optimal ecological niche (habitat suitability) of cockle populations (Cerastoderma edule)

SUPPLEMENTARY DATA
Author

Amélie Lehuen, Chloé Dancie, Florent Grasso, Francis Orvain

Published

February 2, 2024

1 Introduction

2 Materials and Methods

All data processing was conducted in R version 4.3.0 (2023-04-21 ucrt) except for MARS 3D pre-treatment in Matlab 2019a. Significance levels are p < .0001 with “****”, p < .001 with “***”, p < .01 with “**”, p < .05 with “*”.

2.1 Study area

Several historically known areas of the Seine estuary with different habitats or communities were studied, mainly mudflats and subtidal areas (Figure 2.A).

SuppFig 2.A: Maps of habitats areas defined in the study area. Dots represent the location of the biological samples

2.2 Biological model

2.3 Datasets

2.3.1 Biological data

The raw data (n= 50,948) were harmonised and grouped in a single database which contain a total of 31,079 observations, and 187 sampling stations (with some variation in coordinates from year to year), with an average of 87 stations sampled in each campaign, mainly in Oct, Sep, Nov. A series of 5-year periods was chosen among the periods covered by the dataset, from 2000 to 2019 (the years before 2000 were discarded as they contained too few observations, n=216), with 1 to 2 sampling campaigns per year: 2000-2005, including the construction of ‘Port 2000’ which caused major disruptions in the estuary; 2006-2010; 2011-2015; 2016-2019. 627 different species are contained in the records.

2.3.2 Hydro-Morpho-Sedimentary data

The All Seine Bay MARS 3D model was calculated using a grid of 78,400 cells (700 by 112), focused on the Seine Estuary with a mesh of 5,184 cells (103 by 68), where the cells were on average 100m long, with the coordinates of Longitude -0.0361 to 0.3217 and Latitude 49.3382 to 49.5488. Each area defined a group with several cells (North Upstream Mudflat: n= 158, North Median Mudflat: n= 713, North Downstream Mudflat: n= 733, South Mudflat: n= 1998, Ilot Oiseaux: n= 23, Channel: n= 1290, Cote Fleurie: n= 133, OffShore: n= 2554, Octeville: n= 100).

The HMS model cells corresponding to the location of the stations in the biological dataset totalled 157 with the following distribution: Channel, n=6; Cote Fleurie, n=4; Ilot Oiseaux, n=1; North Downstream Mudflat, n=9; North Median Mudflat, n=39; North Upstream Mudflat, n=6; Octeville, n=13; OffShore, n=46; South Mudflat, n=33. The positions in the tidal area are: upper intertidal ([0-25%[ immersion time, n= 19), medium intertidal ([25-75%[ immersion time, n= 41), lower intertidal ([75-100%[ immersion time, n= 48), subtidal (100% immersion time, n= 93).

2.4 Models adjustments

2.4.1 Quantile regression

The three model types were computed with different quantiles = [0.5, 0.9, 0.95, 0.975], 0.5 being the equivalent of an OLS regression, and kept as a reference, the other values higher than 0.9 to seek for the optimum response.

2.4.2 Model Validation

3 Results

3.1 Description of the biological data set

Biological data for C. edule comprised a total of n= 543 observations. The observations were split into periods as 2000-2005 (n= 108), 2006-2010 (n= 155), 2011-2015 (n= 174), 2015-2019 (n= 106). The following treatment only focussed on the mudflats used by C. edule (South Mudflat (n= 218), North Median Mudflat (n= 198), North Downstream Mudflat (n= 82), North Upstream Mudflat (n= 2)).

SuppFig 3.A: C. edule population biomass [gAFDW/m²] and density [ind/m²] in the Seine estuary, by period for each area

3.2 Selection of the Hydro-Morpho-Sedimentary factors and their association

SuppFig 3.B: Screeplot for abiotic factors

SuppFig 3.C: PCA for abiotic factors

SuppFig 3.D: PCA for abiotic factors

SuppFig 3.E: Correlation plot of all variables extracted from MARS3D dataset corresponding to biomass and densities from biological dataset. Correlation scores are shown above the diagonal, scatter plots with a linear regression below the diagonal, while the diagonal itself shows the density plot of each variable.
Correlation scores
Abiotic variables
variable current speed daily maximum current speed inundation time salinity daily salinity range temperature daily temperature range MES mud bathymetry yearly sediment budget bed shear stress daily maximum bed shear stress sediment total concentration
current speed
daily maximum current speed 0.73****
inundation time 0.74**** 0.56****
salinity 0.38**** 0.11* 0.58****
daily salinity range 0.11* 0.30**** 0.07 -0.59****
temperature 0.36**** 0.31**** 0.52**** 0.46**** 0.02
daily temperature range -0.60**** -0.30**** -0.75**** -0.55**** 0.11* -0.52****
MES mud 0.22**** 0.14** 0.28**** 0.34**** -0.16** 0.15** -0.33****
bathymetry 0.81**** 0.59**** 0.89**** 0.61**** -0.08 0.43**** -0.73**** 0.26****
yearly sediment budget 0.06 0.09 -0.01 0.01 0.04 0.00 0.05 0.05 0.04
bed shear stress 0.46**** 0.27**** 0.63**** 0.54**** -0.14** 0.40**** -0.50**** 0.60**** 0.54**** -0.07
daily maximum bed shear stress -0.24**** 0.02 -0.21**** -0.07 -0.06 -0.15** 0.21**** 0.19**** -0.23**** 0.10* -0.06
sediment total concentration -0.02 -0.04 -0.16** -0.02 -0.08 -0.08 0.23**** 0.03 -0.19**** -0.20**** 0.16** 0.17***
mud content 0.04 -0.06 0.15** 0.06 -0.02 0.05 -0.22**** -0.02 0.21**** 0.17*** -0.10* -0.29**** -0.77****

3.3 Description of the Hydro-Morpho-Sedimentary data

The selected predictors were observed in the same period and in the same area as the biological data (Figure 3.F). Generally speaking, all the factors differed significantly in area and period:

  • Daily max current speed [m.s^-1]: mean of 1.05 +/- 0.21 (upstream 0.43 +/- 0.34; median 0.63 +/- 0.3).

  • Inundation time [%]

  • Daily salinity range

  • Temperature [°C]

  • Mud content [%]: Sandy mud (north upstream 42.43 +/- 30.48, channel 42.63 +/- 25.44). Muddy sand at 20.84 +/- 9.48.

  • Daily max bed shear stress [N.m-2]: channel BSS mean (3.02 +/- 1.36), BSS in the other areas (1.66 +/- 0.57).

  • Bathymetry [m]: The depth of the channel and offshore mean range of 6.94-7.97. The north downstream mudflat and the south mudflat were the next deepest areas (4.18 +/- 0.95 ), the median mudflat was 1.77 +/- 3.3, and north upstream mudflat was the shallowest, -1.58 +/- 2.64 (negative bathymetry being above the mean height of sea water).

SuppFig 3.F: Mars3D selected factors maps in the Seine estuary

SuppFig 3.G: Mars3D selected factors means by areas and periods in the Seine estuary

SuppFig 3.H: Mars3D selected factors density plot in the Seine estuary

3.4 Methodology assessment

SuppFig 3.I: ∆AICc comparison for all SDMs computed, according to the quantile, the type of model and the response.

SuppFig 3.J: Example of modelled vs observed data plotted for each model based on biomass under set A. The black line represents the 1:1 ratio.

SuppFig 3.K: Examples of SDM surface plots under set A: linear with interaction (A), Gaussian non-linear (B) and Cubic B-Spline linear (C). The upper panel shows the surfaces side by side, the lower panel shows the 3D plots with all processed quantiles superposed. The biological data observed with this model are represented by a contour plot, the data over the model are represented by red stars.
SuppFig 3.L: SDM interactive 3D plots
SuppFig 3.M: SDM interactive 3D plots
SuppFig 3.N: SDM interactive 3D plots

3.5 Species Distribution Models – Optimal Ecological Niche (SDM)

3.5.1 Comparison of linear and nonlinear Quantile Regression

The four sets (A, B, C and D) were treated with one of the two selected models: either linear with interaction, or non-linear with a bifactorial Gaussian equation, with biomass as biological response (with predict/observed plot, SDMs with the distribution of HMS data, RQ2nli SDMs were represented in 3D graph in Section 3.5.1.3).

  1. Daily max current speed [m.s-1] and inundation time [%]:
    1. RQ2int optimum was 94.82gAFDW/m², at 0 m.s-1 and 100 %; RQ2nli optimum was 233gAFDW/m² at 0 m.s-1 and 100 %.

    2. RQ2int optimum was 2129.29ind/m² at 0 m.s-1 and 100 %; RQ2nli optimum was 2195.67ind/m² at 0.43 m.s-1 and 100 %.

  2. Daily salinity range and daily temperature range [degC]:
    1. RQ2int optimum was 68.96gAFDW/m² at 0.2 u.s.i and 13.01 degC; RQ2nli optimum was 91.35gAFDW/m² at 0.2 u.s.i and 12.35 degC.

    2. RQ2int optimum was 1962.03ind/m² at 24.4 u.s.i and 13.01 degC; RQ2nli optimum was 980.01ind/m² at 0.2 u.s.i and 13.01 degC.

  3. Daily salinity range and bathymetry [m]:
    1. RQ2int optimum was 172.64gAFDW/m² at 0.2u.s.i and 13.95 m; RQ2nli optimum was 147.6gAFDW/m² at 0.2 u.s.i and 5.14 m.

    2. RQ2int optimum was 3792.69ind/m² at 0.2 u.s.i and 13.95 m; RQ2nli optimum was 4160.29ind/m² at 0.2 u.s.i and 9.02 m.

  4. Mud content [%] and bed shear stress [N.m-2]:
    1. RQ2int optimum was 104.58gAFDW/m² at 100 % and 0 N.m-2; RQ2nli optimum wasf 106.94gAFDW/m² at 35 % and 1.33 N.m-2.

    2. RQ2int optimum was 2990.53ind/m² at 100 % and 0 N.m-2; RQ2nli optimum was 1545.55ind/m² at 100 % and 0.44 N.m-2.

3.5.1.1 Linear and Gaussian SDM visualisation

Sets computed with linear model (top row, numbered 1) and non-linear with a Gaussian equation (bottom row, numbered 2), the observed biological data under the model surface are represented by an isometric curve, the data over the model are represented by red stars. Each pair has its own scale to ensure visibility

Sets computed with linear model (top row, numbered 1) and non-linear with a Gaussian equation (bottom row, numbered 2), the observed biological data under the model surface are represented by an isometric curve, the data over the model are represented by red stars. Each pair has its own scale to ensure visibility

3.5.1.2 Linear and Gaussian SDM modelled vs observed plot

Modelled vs observed data plot for RQ2int and RQ2nli SDM. Black line represents the 1:1 ratio

Modelled vs observed data plot for RQ2int and RQ2nli SDM. Black line represents the 1:1 ratio

3.5.1.3 Interactive 3D SDM visualisation

Only RQ2nli SDM were represented, with all quantiles representation :

3.5.2 Non-linear quantile regression with bifactorial Gaussian equation models

3.5.2.1 Application maps

SDM models applied on the Seine estuary over the five periods in suitability index.

SDM models applied on the Seine estuary over the five periods in potential biomass.

SDM models applied on the Seine estuary over the five periods in potential density.

3.5.2.2 Suitability index

Abiotic factors and suitability index per period and per area for all SDM models with a 95% confidence interval.

Abiotic factors and suitability index per period and per area for all SDM models with a 95% confidence interval.

4 Supplementary data

4.1 Session information

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  French_France.utf8
 ctype    French_France.utf8
 tz       Europe/Paris
 date     2024-02-02
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
 quarto   1.3.353 @ C:\\PROGRA~1\\RStudio\\RESOUR~1\\app\\bin\\quarto\\bin\\quarto.exe

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version    date (UTC) lib source
 beepr        * 1.3        2018-06-04 [1] CRAN (R 4.3.1)
 boot         * 1.3-28.1   2022-11-22 [1] CRAN (R 4.3.0)
 broom        * 1.0.5      2023-06-09 [1] CRAN (R 4.3.1)
 colorspace   * 2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
 conflicted   * 1.2.0      2023-02-01 [1] CRAN (R 4.3.1)
 data.table   * 1.14.8     2023-02-17 [1] CRAN (R 4.3.1)
 dplyr        * 1.1.3      2023-09-03 [1] CRAN (R 4.3.1)
 explor       * 0.3.10     2023-04-29 [1] CRAN (R 4.3.2)
 factoextra   * 1.0.7      2020-04-01 [1] CRAN (R 4.3.2)
 FactoMineR   * 2.9        2023-10-12 [1] CRAN (R 4.3.2)
 figpatch     * 0.2        2022-05-03 [1] CRAN (R 4.3.1)
 forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.3.1)
 geomtextpath * 0.1.1      2022-08-30 [1] CRAN (R 4.3.2)
 GGally       * 2.1.2      2021-06-21 [1] CRAN (R 4.3.1)
 gginnards    * 0.1.2      2023-05-24 [1] CRAN (R 4.3.1)
 ggplot2      * 3.4.3      2023-08-14 [1] CRAN (R 4.3.1)
 ggplotify    * 0.1.2      2023-08-09 [1] CRAN (R 4.3.2)
 ggpmisc      * 0.5.5      2023-11-15 [1] CRAN (R 4.3.2)
 ggpp         * 0.5.6      2024-01-09 [1] CRAN (R 4.3.2)
 ggpubr       * 0.6.0      2023-02-10 [1] CRAN (R 4.3.1)
 ggridges     * 0.5.4      2022-09-26 [1] CRAN (R 4.3.1)
 ggsci        * 3.0.0      2023-03-08 [1] CRAN (R 4.3.1)
 grafify      * 4.0        2023-10-07 [1] CRAN (R 4.3.1)
 gt           * 0.10.0     2023-10-07 [1] CRAN (R 4.3.1)
 here         * 1.0.1      2020-12-13 [1] CRAN (R 4.3.1)
 Hmisc        * 5.1-1      2023-09-12 [1] CRAN (R 4.3.1)
 htmlwidgets  * 1.6.2      2023-03-17 [1] CRAN (R 4.3.1)
 introdataviz * 0.0.0.9003 2023-10-09 [1] Github (psyteachr/introdataviz@0519c98)
 kableExtra   * 1.3.4      2021-02-20 [1] CRAN (R 4.3.1)
 knitr        * 1.44       2023-09-11 [1] CRAN (R 4.3.1)
 lattice      * 0.21-8     2023-04-05 [1] CRAN (R 4.3.0)
 lubridate    * 1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
 openxlsx     * 4.2.5.2    2023-02-06 [1] CRAN (R 4.3.1)
 patchwork    * 1.1.3      2023-08-14 [1] CRAN (R 4.3.1)
 permute      * 0.9-7      2022-01-27 [1] CRAN (R 4.3.2)
 plot3D       * 1.4        2021-05-22 [1] CRAN (R 4.3.1)
 plotly       * 4.10.2     2023-06-03 [1] CRAN (R 4.3.1)
 purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
 quantreg     * 5.97       2023-08-19 [1] CRAN (R 4.3.1)
 RColorBrewer * 1.1-3      2022-04-03 [1] CRAN (R 4.3.0)
 readr        * 2.1.4      2023-02-10 [1] CRAN (R 4.3.1)
 readxl       * 1.4.3      2023-07-06 [1] CRAN (R 4.3.1)
 reshape2     * 1.4.4      2020-04-09 [1] CRAN (R 4.3.1)
 rlist        * 0.4.6.2    2021-09-03 [1] CRAN (R 4.3.1)
 rstatix      * 0.7.2      2023-02-01 [1] CRAN (R 4.3.1)
 scales       * 1.2.1      2022-08-20 [1] CRAN (R 4.3.1)
 sessioninfo  * 1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
 sf           * 1.0-14     2023-07-11 [1] CRAN (R 4.3.1)
 sfheaders    * 0.4.3      2023-06-22 [1] CRAN (R 4.3.1)
 SparseM      * 1.81       2021-02-18 [1] CRAN (R 4.3.0)
 stringr      * 1.5.0      2022-12-02 [1] CRAN (R 4.3.1)
 terra        * 1.7-46     2023-09-06 [1] CRAN (R 4.3.1)
 tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
 tidyr        * 1.3.0      2023-01-24 [1] CRAN (R 4.3.1)
 tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.3.1)
 tmap         * 3.3-4      2023-09-12 [1] CRAN (R 4.3.1)
 tmaptools    * 3.1-1      2021-01-19 [1] CRAN (R 4.3.1)
 vegan        * 2.6-4      2022-10-11 [1] CRAN (R 4.3.2)
 wesanderson  * 0.3.6      2018-04-20 [1] CRAN (R 4.3.1)

 [1] C:/Program Files/R/R-4.3.0/library

──────────────────────────────────────────────────────────────────────────────

The Scientific colour map batlow (Crameri 2018) is used in this study to prevent visual distortion of the data and exclusion of readers with colour­vision deficiencies (Crameri, Shephard, and Heron 2020).

References

Crameri, Fabio, Grace E. Shephard, and Philip J. Heron. 2020. “The Misuse of Colour in Science Communication.” Nature Communications 11 (1): 5444. https://doi.org/10.1038/s41467-020-19160-7.